Counterfactual Invariance to Spurious Correlations in Text Classification
Informally, a 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter. In machine learning, these have a know-it-when-you-see-it character; e.g., changing the gender of a sentence's subject changes a sentiment predictor's output. To check for spurious correlations, we can 'stress test' models by perturbing irrelevant parts of input data and seeing if model predictions change. In this paper, we study stress testing using the tools of causal inference. We introduce counterfactual invariance as a formalization of the requirement that changing irrelevant parts of the input shouldn't change model predictions.
A Proofs
This is essentially by definition: an intervention on Z does not change the potential outcomes, so it does not change the value of f(X).
Suppose f is a counterfactually invariant predictor, and let L be either squared-error or cross-entropy loss. Suppose that the target distribution Q is causally compatible with the training distribution P, and that either of the following conditions holds:
1. the data obeys the anti-causal graph, or
2. the data obeys the causal-direction graph, there is no confounding (but possibly selection), and the association is purely spurious, Y ⊥ X | X^⊥_Z.
We begin with the anti-causal case.
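The stress-testing idea the paper formalizes can be sketched in a few lines of code. This is a minimal illustration, not the paper's method: the gender-swap map and the `toy_sentiment` predictor are hypothetical stand-ins for a real counterfactual transform and a real model.

```python
# Counterfactual-invariance "stress test" sketch: perturb an irrelevant
# attribute of the input and check whether the prediction changes.

GENDER_SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
               "his": "her", "actor": "actress", "actress": "actor"}

def gender_swap(text: str) -> str:
    """A crude counterfactual transform: swap gendered tokens."""
    return " ".join(GENDER_SWAP.get(tok, tok) for tok in text.split())

def toy_sentiment(text: str) -> int:
    """Hypothetical classifier: 1 if more positive than negative words."""
    pos = {"great", "good", "excellent"}
    neg = {"bad", "awful", "boring"}
    toks = text.split()
    score = sum(t in pos for t in toks) - sum(t in neg for t in toks)
    return 1 if score > 0 else 0

def stress_test(predict, texts, transform) -> float:
    """Fraction of inputs whose prediction flips under the transform.
    0.0 means the predictor passes this particular stress test."""
    flips = sum(predict(t) != predict(transform(t)) for t in texts)
    return flips / len(texts)

texts = ["he thought the film was great", "she found it boring and bad"]
flip_rate = stress_test(toy_sentiment, texts, gender_swap)  # 0.0 here
```

Because the toy classifier never looks at gendered tokens, its flip rate is zero; a model that did key on them would show a positive flip rate, which is exactly the failure the stress test is meant to surface.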
Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests
Veitch, Victor, D'Amour, Alexander, Yadlowsky, Steve, Eisenstein, Jacob
AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis
Ma, Haroui, Quinzan, Francesco, Willem, Theresa, Bauer, Stefan
Machine learning (ML) systems for medical imaging have demonstrated remarkable diagnostic capabilities, but their susceptibility to biases poses significant risks, since biases may negatively impact generalization performance. In this paper, we introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics. Our method leverages the concept of counterfactual invariance, measuring the extent to which a model's predictions remain unchanged under hypothetical changes to sensitive attributes. We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases without requiring direct access to counterfactual data. Through experiments on synthetic datasets and large-scale real-world medical imaging datasets, including CheXpert and MIMIC-CXR, we demonstrate that our approach aligns closely with counterfactual fairness principles and outperforms standard baselines. This work provides a robust tool to ensure that ML diagnostic systems generalize well, e.g., across demographic groups, offering a critical step towards AI safety in healthcare. Code: https://github.com/Neferpitou3871/AI-Alignment-Medical-Imaging.
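The hypothesis-testing step can be sketched independently of the generative model. Below is a minimal paired sign-flip permutation test under the assumption that counterfactual predictions are already available (the paper obtains them via a conditional latent diffusion model; here `preds` and `preds_cf` are just arrays of model outputs on original and counterfactual inputs).

```python
import numpy as np

def paired_permutation_test(preds, preds_cf, n_perm=10_000, seed=0):
    """p-value for H0: the model is counterfactually invariant (zero mean
    paired difference), via random sign flips of the paired differences."""
    rng = np.random.default_rng(seed)
    d = np.asarray(preds, dtype=float) - np.asarray(preds_cf, dtype=float)
    observed = abs(d.mean())
    # Under H0 each paired difference is symmetric around 0, so its sign
    # can be flipped at random to simulate the null distribution.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    return (1 + int((null >= observed).sum())) / (n_perm + 1)

rng = np.random.default_rng(1)
preds = rng.normal(size=200)
p_inv = paired_permutation_test(preds, preds)        # exactly invariant: p = 1.0
p_dep = paired_permutation_test(preds, preds + 0.5)  # constant shift: tiny p
```

A large p-value is consistent with counterfactual invariance; a tiny one, as in the shifted case, flags a systematic dependence of predictions on the sensitive attribute.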
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Wang, Chaoqi, Zhao, Zhuokai, Jiang, Yibo, Chen, Zhaorun, Zhu, Chen, Chen, Yuxin, Liu, Jiayi, Zhang, Lizhu, Fan, Xiangjun, Ma, Hao, Wang, Sinong
Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities in generating coherent, contextually appropriate responses across a wide range of tasks (Brown et al., 2020). A key approach to further refine these models is Reinforcement Learning from Human Feedback (RLHF), which leverages human evaluations to guide the training process and align model outputs more closely with human preferences (Stiennon et al., 2020; Ouyang et al., 2022; Bai et al., 2022; Wang et al., 2024). RLHF typically involves training a reward model to capture human preferences, which is then used to fine-tune LLMs via reinforcement learning (RL) (Schulman et al., 2017; Chen et al., 2024b,f).

Despite the success of RLHF, reward modeling is inherently prone to spurious correlations: associations in the training data that do not reflect true causal relationships (Veitch et al., 2021). These can lead to unintended biases and induce reward hacking (McMilin, 2022). Reward hacking occurs when RL agents exploit flaws or ambiguities in the reward function to maximize rewards without genuinely improving alignment with desired behaviors or completing designed tasks (Amodei et al., 2016; Weng, 2024).

Consequently, this leads to misaligned models that exhibit biases such as favoring longer outputs (length bias) (Zheng et al., 2023), agreeing with users' incorrect assertions (sycophancy bias) (Perez et al., 2022), developing unintended shortcuts when making predictions (concept bias) (Zhou et al., 2023), and implicitly developing discrimination against certain demographic groups (discrimination bias) (Tamkin et al., 2023; Chen et al., 2024c). These biases, rooted in spurious correlations and reward hacking rather than true causal relationships, undermine the reliability and trustworthiness of LLMs, posing significant challenges for their safe and responsible deployment in real-world applications (Anwar et al., 2024; Qi et al., 2024).
To understand and mitigate these issues, it is essential to consider the sources of error in reward modeling.
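A crude diagnostic in the spirit of this discussion (not from the paper): if a learned reward correlates strongly with an irrelevant surface feature such as response length, that is a hint of a spurious correlation that RL could exploit. The deliberately length-biased `toy_reward` below is a hypothetical stand-in for a real reward model.

```python
# Check how strongly a reward model's scores track response length.

def toy_reward(response: str) -> float:
    """Hypothetical reward model that leaks length into its score."""
    return 0.1 * len(response.split()) + (1.0 if "thanks" in response else 0.0)

def pearson_r(xs, ys) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

responses = ["ok", "sure thing", "that is a fine idea indeed",
             "here is a very long and padded answer with many words"]
lengths = [len(r.split()) for r in responses]
rewards = [toy_reward(r) for r in responses]
r = pearson_r(lengths, rewards)  # near 1.0 for this length-biased toy model
```

A correlation this high between reward and length, across responses of similar quality, is exactly the kind of non-causal association that produces length bias in the fine-tuned policy.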
Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions
Cotta, Leonardo, Maddison, Chris J.
Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes decision-making. On the other hand, these models are still consistently making predictions that contradict users' or society's expectations, e.g., hallucinating, or discriminating. Thus, it is important that we develop test-time strategies to improve their trustworthiness. Inspired by prior work, we leverage causality as a tool to formally encode two aspects of trustworthiness in LLMs: fairness and robustness. Under this perspective, existing test-time solutions explicitly instructing the model to be fair or robust implicitly depend on the LLM's causal reasoning capabilities. In this work, we explore the opposite approach. Instead of explicitly asking the LLM for trustworthiness, we design prompts to encode the underlying causal inference algorithm that will, by construction, result in more trustworthy predictions. Concretely, we propose out-of-context prompting as a test-time solution to encourage fairness and robustness in LLMs. Out-of-context prompting leverages the user's prior knowledge of the task's causal model to apply (random) counterfactual transformations and improve the model's trustworthiness. Empirically, we show that out-of-context prompting consistently improves the fairness and robustness of frontier LLMs across five different benchmark datasets without requiring additional data, finetuning or pre-training.
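The gist of this test-time strategy can be sketched with a stub in place of a real LLM call. Everything here is illustrative rather than the paper's exact procedure: the name list, the `biased_predict` stub, and the majority vote are all assumptions.

```python
# Out-of-context prompting sketch: replace the sensitive attribute with
# random out-of-context values before predicting, then aggregate.
import random

NAMES = ["Alex", "Sam", "Jordan", "Taylor"]

def biased_predict(text: str) -> int:
    """Stub for an LLM that (undesirably) keys on the name 'Bob'."""
    return 1 if "Bob" in text else 0

def out_of_context_predict(predict, text: str, name: str, k=5, seed=0) -> int:
    """Swap the sensitive token for k random counterfactual values and
    majority-vote the resulting predictions."""
    rng = random.Random(seed)
    votes = [predict(text.replace(name, rng.choice(NAMES))) for _ in range(k)]
    return int(sum(votes) > k / 2)

prompt = "Bob applied for the loan with a stable income"
direct = biased_predict(prompt)                              # 1: depends on the name
ooc = out_of_context_predict(biased_predict, prompt, "Bob")  # 0: name randomized away
```

By construction, the aggregated prediction cannot depend on the original name, so the stub's name bias is removed without asking the model to "be fair".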
Learning Counterfactually Invariant Predictors
Quinzan, Francesco, Casolo, Cecilia, Muandet, Krikamol, Luo, Yucen, Kilbertus, Niki
Invariance, or equivariance to certain data transformations, has proven essential in numerous applications of machine learning (ML), since it can lead to better generalization capabilities [Arjovsky et al., 2019, Bloem-Reddy and Teh, 2020, Chen et al., 2020]. For instance, in image recognition, predictions ought to remain unchanged under scaling, translation, or rotation of the input image. Data augmentation, an early heuristic to promote such invariances, has become indispensable for successfully training deep neural networks (DNNs) [Shorten and Khoshgoftaar, 2019, Xie et al., 2020]. Well-known examples of "invariance by design" include convolutional neural networks (CNNs) for translation invariance [Krizhevsky et al., 2012], group-equivariant NNs for general group transformations [Cohen and Welling, 2016], recurrent neural networks (RNNs) and transformers for sequential data [Vaswani et al., 2017], DeepSets [Zaheer et al., 2017] for sets, and graph neural networks (GNNs) for different types of geometric structures [Battaglia et al., 2018]. Many applications in modern ML, however, call for arguably stronger notions of invariance based on causality. This case has been made for image classification, algorithmic fairness [Hardt et al., 2016, Mitchell et al., 2021], robustness [Bühlmann, 2020], and out-of-distribution generalization [Lu et al., 2021]. The goal is invariance with respect to hypothetical manipulations of the data generating process (DGP). Various works develop methods that assume observational distributions (across environments or between training and test) to be governed by shared causal mechanisms, but differ due to various types of distribution shifts encoded by the causal model [Arjovsky et al., 2019, Bühlmann, 2020, Heinze-Deml et al., 2018, Makar et al., 2022].
Part of this work was done while Francesco Quinzan visited the Max Planck Institute for Intelligent Systems, Tübingen, Germany.
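One generic way to promote such invariance during training, sketched below, is to penalize the prediction gap across counterfactual pairs. This is an illustrative regularizer under the assumption that counterfactual pairs are available, not the kernel-based (HSCIC) operator this paper develops.

```python
import numpy as np

def invariance_penalty(f, X, X_cf) -> float:
    """Mean squared prediction gap across counterfactual pairs."""
    return float(np.mean((f(X) - f(X_cf)) ** 2))

def total_loss(f, X, y, X_cf, lam=1.0) -> float:
    """Task loss plus a counterfactual-invariance penalty."""
    mse = float(np.mean((f(X) - y) ** 2))
    return mse + lam * invariance_penalty(f, X, X_cf)

# Toy data: column 0 is a stable feature; column 1 is a spurious one that
# the counterfactual operation resamples.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X_cf = X.copy()
X_cf[:, 1] = rng.normal(size=100)

invariant_f = lambda A: A[:, 0]   # ignores the spurious coordinate
spurious_f = lambda A: A[:, 1]    # keys on the spurious coordinate

gap_inv = invariance_penalty(invariant_f, X, X_cf)    # 0.0
gap_spur = invariance_penalty(spurious_f, X, X_cf)    # > 0
```

The penalty is zero exactly for predictors that ignore the manipulated coordinate, so minimizing the combined loss pushes the model toward counterfactually invariant solutions.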